Chi-squared distribution

Probability density function
Cumulative distribution function
Notation \chi^2(k)\! or \chi^2_k\!
Parameters kN1 — degrees of freedom
Support x ∈ [0, +∞)
PDF \frac{1}{2^{k/2}\Gamma(k/2)}\; x^{k/2-1} e^{-x/2}\,
CDF \frac{1}{\Gamma(k/2)}\;\gamma(k/2,\,x/2)
Mean k
Median \approx k\bigg(1-\frac{2}{9k}\bigg)^3
Mode max{ k − 2, 0 }
Variance 2k
Skewness \scriptstyle\sqrt{8/k}\,
Ex. kurtosis 12 / k
Entropy \frac{k}{2}\!%2B\!\ln(2\Gamma(k/2))\!%2B\!(1\!-\!k/2)\psi(k/2)
MGF (1 − 2 t)k/2   for  t  < ½
CF (1 − 2 it)k/2      [1]

In probability theory and statistics, the chi-squared distribution (also chi-square or χ²-distribution) with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables. It is one of the most widely used probability distributions in inferential statistics, e.g., in hypothesis testing or in construction of confidence intervals.[2][3][4][5] When there is a need to contrast it with the noncentral chi-squared distribution, this distribution is sometimes called the central chi-squared distribution.

The chi-squared distribution is used in the common chi-squared tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation. Many other statistical tests also use this distribution, like Friedman's analysis of variance by ranks.

The chi-squared distribution is a special case of the gamma distribution.

Contents

Definition

If Z1, ..., Zk are independent, standard normal random variables, then the sum of their squares,


    Q\ = \sum_{i=1}^k Z_i^2 ,

is distributed according to the chi-squared distribution with k degrees of freedom. This is usually denoted as


    Q\ \sim\ \chi^2(k)\ \ \text{or}\ \ Q\ \sim\ \chi^2_k .

The chi-squared distribution has one parameter: k — a positive integer that specifies the number of degrees of freedom (i.e. the number of Zi’s)

Characteristics

Further properties of the chi-squared distribution can be found in the box at the upper right corner of this article.

Probability density function

The probability density function (pdf) of the chi-squared distribution is


f(x;\,k) =
\begin{cases}
  \frac{1}{2^{k/2}\Gamma(k/2)}\,x^{k/2 - 1} e^{-x/2},  & x \geq 0; \\ 0, & \text{otherwise}.
\end{cases}

where Γ(k/2) denotes the Gamma function, which has closed-form values at the half-integers.

For derivations of the pdf in the cases of one and two degrees of freedom, see Proofs related to chi-squared distribution.

Cumulative distribution function

Its cumulative distribution function is:


    F(x;\,k) = \frac{\gamma(k/2,\,x/2)}{\Gamma(k/2)} = P(k/2,\,x/2),

where γ(k,z) is the lower incomplete Gamma function and P(k,z) is the regularized Gamma function.

In a special case of k = 2 this function has a simple form:


    F(x;\,2) = 1 - e^{-\frac{x}{2}}.

Tables of this distribution — usually in its cumulative form — are widely available and the function is included in many spreadsheets and all statistical packages. For a closed form approximation for the CDF, see under Noncentral chi-squared distribution.

Additivity

It follows from the definition of the chi-squared distribution that the sum of independent chi-squared variables is also chi-squared distributed. Specifically, if {Xi}i=1n are independent chi-squared variables with {ki}i=1n degrees of freedom, respectively, then Y = X1 + ⋯ + Xn is chi-squared distributed with k1 + ⋯ + kn degrees of freedom.

Information entropy

The information entropy is given by


    H = \int_{-\infty}^\infty f(x;\,k)\ln f(x;\,k) \, dx
      = \frac{k}{2} %2B \ln\big( 2\Gamma(k/2) \big) %2B \big(1 - k/2\big) \psi(k/2),

where ψ(x) is the Digamma function.

The Chi-squared distribution is the maximum entropy probability distribution for a random variate X for which E(X)=\nu is fixed, and E(\ln(X))=\psi\left(\frac{1}{2}\right)%2B\ln(2) is fixed. [6]

Noncentral moments

The moments about zero of a chi-squared distribution with k degrees of freedom are given by[7][8]


    \operatorname{E}(X^m) = k (k%2B2) (k%2B4) \cdots (k%2B2m-2) = 2^m \frac{\Gamma(m%2Bk/2)}{\Gamma(k/2)}.

Cumulants

The cumulants are readily obtained by a (formal) power series expansion of the logarithm of the characteristic function:


    \kappa_n = 2^{n-1}(n-1)!\,k

Asymptotic properties

By the central limit theorem, because the chi-squared distribution is the sum of k independent random variables with finite mean and variance, it converges to a normal distribution for large k. For many practical purposes, for k > 50 the distribution is sufficiently close to a normal distribution for the difference to be ignored.[9] Specifically, if X ~ χ²(k), then as k tends to infinity, the distribution of (X-k)/\sqrt{2k} tends to a standard normal distribution. However, convergence is slow as the skewness is \sqrt{8/k} and the excess kurtosis is 12/k. Other functions of the chi-squared distribution converge more rapidly to a normal distribution. Some examples are:

Related distributions

A chi-squared variable with k degrees of freedom is defined as the sum of the squares of k independent standard normal random variables.

If Y is a k-dimensional Gaussian random vector with mean vector μ and rank k covariance matrix C, then X = (Yμ)TC−1(Yμ) is chi-squared distributed with k degrees of freedom.

The sum of squares of statistically independent unit-variance Gaussian variables which do not have mean zero yields a generalization of the chi-squared distribution called the noncentral chi-squared distribution.

If Y is a vector of k i.i.d. standard normal random variables and A is a k×k idempotent matrix with rank k−n then the quadratic form YTAY is chi-squared distributed with k−n degrees of freedom.

The chi-squared distribution is also naturally related to other distributions arising from the Gaussian. In particular,

Generalizations

The chi-squared distribution is obtained as the sum of the squares of k independent, zero-mean, unit-variance Gaussian random variables. Generalizations of this distribution can be obtained by summing the squares of other types of Gaussian random variables. Several such distributions are described below.

Chi-squared distributions

Noncentral chi-squared distribution

The noncentral chi-squared distribution is obtained from the sum of the squares of independent Gaussian random variables having unit variance and nonzero means.

Generalized chi-squared distribution

The generalized chi-squared distribution is obtained from the quadratic form z′Az where z is a zero-mean Gaussian vector having an arbitrary covariance matrix, and A is an arbitrary matrix.

Gamma, exponential, and related distributions

The chi-squared distribution X ~ χ²(k) is a special case of the gamma distribution, in that X ~ Γ(k/2, 2) (using the shape parameterization of the gamma distribution) where k/2 is an integer.

Because the exponential distribution is also a special case of the Gamma distribution, we also have that if X ~ χ²(2), then X ~ Exp(1/2) is an exponential distribution.

The Erlang distribution is also a special case of the Gamma distribution and thus we also have that if X ~ χ²(k) with even k, then X is Erlang distributed with shape parameter k/2 and scale parameter 1/2.

Applications

The chi-squared distribution has numerous applications in inferential statistics, for instance in chi-squared tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student’s t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables, each divided by their respective degrees of freedom.

Following are some of the most common situations in which the chi-squared distribution arises from a Gaussian-distributed sample.

Name Statistic
chi-squared distribution \sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2
noncentral chi-squared distribution \sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2
chi distribution \sqrt{\sum_{i=1}^k \left(\frac{X_i-\mu_i}{\sigma_i}\right)^2}
noncentral chi distribution \sqrt{\sum_{i=1}^k \left(\frac{X_i}{\sigma_i}\right)^2}

Table of χ2 value vs p-value

The p-value is the probability of observing a test statistic at least as extreme in a chi-squared distribution. Accordingly, since the cumulative distribution function (CDF) for the appropriate degrees of freedom (df) gives the probability of having obtained a value less extreme than this point, subtracting the CDF value from 1 gives the p-value. The table below gives a number of p-values matching to χ2 for the first 10 degrees of freedom.

A p-value of 0.05 or less is usually regarded as statistically significant, i.e. the observed deviation from the null hypothesis is significant.

Degrees of freedom (df) χ2 value [11]
1
0.004 0.02 0.06 0.15 0.46 1.07 1.64 2.71 3.84 6.64 10.83
2
0.10 0.21 0.45 0.71 1.39 2.41 3.22 4.60 5.99 9.21 13.82
3
0.35 0.58 1.01 1.42 2.37 3.66 4.64 6.25 7.82 11.34 16.27
4
0.71 1.06 1.65 2.20 3.36 4.88 5.99 7.78 9.49 13.28 18.47
5
1.14 1.61 2.34 3.00 4.35 6.06 7.29 9.24 11.07 15.09 20.52
6
1.63 2.20 3.07 3.83 5.35 7.23 8.56 10.64 12.59 16.81 22.46
7
2.17 2.83 3.82 4.67 6.35 8.38 9.80 12.02 14.07 18.48 24.32
8
2.73 3.49 4.59 5.53 7.34 9.52 11.03 13.36 15.51 20.09 26.12
9
3.32 4.17 5.38 6.39 8.34 10.66 12.24 14.68 16.92 21.67 27.88
10
3.94 4.86 6.18 7.27 9.34 11.78 13.44 15.99 18.31 23.21 29.59
P value (Probability)
0.95 0.90 0.80 0.70 0.50 0.30 0.20 0.10 0.05 0.01 0.001
Nonsignificant Significant

See also

References

  1. ^ M.A. Sanders. "Characteristic function of the central chi-squared distribution". http://www.planetmathematics.com/CentralChiDistr.pdf. Retrieved 2009-03-06. 
  2. ^ Abramowitz, Milton; Stegun, Irene A., eds. (1965), "Chapter 26", Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York: Dover, pp. 940, ISBN 978-0486612720, MR0167642, http://www.math.sfu.ca/~cbm/aands/page_940.htm .
  3. ^ NIST (2006). Engineering Statistics Handbook - Chi-Squared Distribution
  4. ^ Jonhson, N.L.; S. Kotz, , N. Balakrishnan (1994). Continuous Univariate Distributions (Second Ed., Vol. 1, Chapter 18). John Willey and Sons. ISBN 0-471-58495-9. 
  5. ^ Mood, Alexander; Franklin A. Graybill, Duane C. Boes (1974). Introduction to the Theory of Statistics (Third Edition, p. 241-246). McGraw-Hill. ISBN 0-07-042864-6. 
  6. ^ Park, Sung Y.; Bera, Anil K. (2009). "Maximum entropy autoregressive conditional heteroskedasticity model". Journal of Econometrics (Elsevier): 219–230. http://www.wise.xmu.edu.cn/Master/Download/..%5C..%5CUploadFiles%5Cpaper-masterdownload%5C2009519932327055475115776.pdf. Retrieved 2011-06-02. 
  7. ^ Chi-squared distribution, from MathWorld, retrieved Feb. 11, 2009
  8. ^ M. K. Simon, Probability Distributions Involving Gaussian Random Variables, New York: Springer, 2002, eq. (2.35), ISBN 978-0-387-34657-1
  9. ^ Box, Hunter and Hunter. Statistics for experimenters. Wiley. p. 46. 
  10. ^ Wilson, E.B.; Hilferty, M.M. (1931) "The distribution of chi-squared". Proceedings of the National Academy of Sciences, Washington, 17, 684–688.
  11. ^ Chi-Squared Test Table B.2. Dr. Jacqueline S. McLaughlin at The Pennsylvania State University. In turn citing: R.A. Fisher and F. Yates, Statistical Tables for Biological Agricultural and Medical Research, 6th ed., Table IV

External links